Human-centric Video Understanding with Weak Supervision: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Abstract
A large fraction of videos, such as entertainment, sports and surveillance videos, are centered around people. We need efficient ways to index such content, i.e., to understand and describe people: Who are they? What are their roles? What are their actions and intentions? One major challenge is that training computer vision models for these tasks typically requires extensive spatial and temporal annotations. Such annotations are often very expensive and difficult to collect at the scale of thousands of videos. We can address this problem by learning from weakly labeled videos, which are readily available and cheaper to collect. However, in such videos the person labels are not spatially or temporally localized. In this thesis, we present models that learn from weakly labeled videos by automatically aligning the labels with the right people in the video to identify their (i) names, (ii) social roles and (iii) actions.

In the first part of this thesis, we consider the problem of identifying the names of people in weakly labeled videos. In particular, we deal with one widely available source of weakly labeled videos: TV episodes. These videos are accompanied only by TV scripts, which provide a noisy description of the characters appearing in different parts of the episodes. The descriptions are often not well aligned with the video, which makes the task more challenging. Further, people in the script are mentioned not only by name but also by pronouns such as “he” and “she” and by nominals such as “doctor” and “teacher”. This adds to the ambiguity in aligning human mentions in the script with their actual appearance in the video. We address these problems by proposing a joint optimization framework for resolving name references in the text (coreference resolution) and name assignments in the video. This joint model leads to better performance on both tasks and is evaluated on a dataset of 19 TV episodes.
Publication date: 2016